Generation#

The generation module provides text generation with language models using distortion-guided beam search. It extends standard beam search by incorporating distortion probabilities derived from the observed (possibly corrupted) input sequences.

Key Components#

  • token_transformation_to_probs: Transforms observed sequences into token indices and probabilities.

  • get_distortion_probs: Computes distortion probabilities for a batch of observed sequences.

  • distortion_probs_to_cuda: Transfers distortion probabilities to a CUDA tensor.

  • distortion_guided_beam_search: Implements the main beam search algorithm with distortion guidance.
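The interaction of these components can be illustrated with a toy sketch (the function, vocabulary, and numbers below are illustrative, not the library's API): each beam's score combines the language model's log-probability with a distortion log-probability that rewards candidates resembling the observed input.

```python
import math

def expand_beams(beams, lm_logprobs, distortion_logprobs, beam_width):
    """Toy distortion-guided beam expansion (illustrative only).

    beams: list of (token_list, score) pairs.
    lm_logprobs / distortion_logprobs: dicts mapping candidate token -> log-prob.
    The combined score is the LM log-prob plus the distortion log-prob, so
    candidates that both fit the language model and resemble the observed
    input are ranked highest.
    """
    candidates = []
    for tokens, score in beams:
        for tok, lm_lp in lm_logprobs.items():
            # Tokens with no distortion mass get a large penalty.
            dist_lp = distortion_logprobs.get(tok, math.log(1e-6))
            candidates.append((tokens + [tok], score + lm_lp + dist_lp))
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_width]

beams = [(["我"], 0.0)]
lm = {"们": math.log(0.6), "门": math.log(0.1)}
dist = {"们": math.log(0.9), "门": math.log(0.9)}  # both resemble the observed char
best = expand_beams(beams, lm, dist, beam_width=1)
```

Here the language model breaks the tie between two equally plausible distortions, which is the core idea behind guiding beam search with distortion probabilities.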

API Documentation#

lmcsc.generation.token_transformation_to_probs(self, observed_sequence: str) → Tuple[List[int], List[float], dict][source]#

Transforms an observed sequence into token indices and their corresponding probabilities.

Parameters:

observed_sequence (str) – The input sequence to be transformed.

Returns:

A tuple containing:
  • List of token indices.

  • List of corresponding probabilities.

  • Dictionary of original token lengths.

Return type:

Tuple[List[int], List[float], dict]
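A minimal sketch of the documented return shape, assuming candidates come from a confusion map over the observed characters (the vocabulary and probabilities below are made up for the example; the real implementation derives them from the tokenizer and distortion model):

```python
# Hypothetical vocabulary and confusion table, for illustration only.
VOCAB = {"的": 0, "地": 1, "得": 2}
CONFUSIONS = {"的": {"的": 0.8, "地": 0.1, "得": 0.1}}

def token_transformation_to_probs(observed_sequence):
    """Map each observed character to candidate token indices and probs."""
    indices, probs, token_lengths = [], [], {}
    for char in observed_sequence:
        # Fall back to the character itself if it has no confusion entry.
        for cand, p in CONFUSIONS.get(char, {char: 1.0}).items():
            if cand in VOCAB:
                indices.append(VOCAB[cand])
                probs.append(p)
                token_lengths[VOCAB[cand]] = len(cand)
    return indices, probs, token_lengths

indices, probs, lengths = token_transformation_to_probs("的")
```

The three parallel structures correspond to the documented tuple: candidate token indices, their distortion probabilities, and the original token lengths keyed by token index.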

lmcsc.generation.get_distortion_probs(self, batch_observed_sequences: List[List[str]], eos_token_id: int) → Tuple[List[int], List[int], List[int], List[float], List[List[dict]], List[bool]][source]#

Computes distortion probabilities for a batch of observed sequences.

Parameters:
  • batch_observed_sequences (List[List[str]]) – A batch of observed sequences.

  • eos_token_id (int) – The end-of-sequence token ID.

Returns:

A tuple containing:
  • List of batch indices.

  • List of beam indices.

  • List of token indices.

  • List of distortion probabilities.

  • List of original token lengths for each beam.

  • List of boolean values indicating if EOS is forced.

Return type:

Tuple[List[int], List[int], List[int], List[float], List[List[dict]], List[bool]]
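The batched return shape can be sketched as follows; this stand-in flattens every (batch, beam) pair into parallel index lists, using an identity transformation in place of the real per-beam candidate generation:

```python
def get_distortion_probs(batch_observed_sequences, eos_token_id):
    """Sketch of flattening a batch of beams into parallel lists.

    Mirrors the documented return shape: one entry per candidate token,
    addressed by (batch index, beam index). The per-beam transformation
    here is a trivial stand-in for token_transformation_to_probs.
    """
    batch_idx, beam_idx, token_idx, dist_probs = [], [], [], []
    token_lengths, force_eos = [], []
    for b, beams in enumerate(batch_observed_sequences):
        beam_lengths = []
        for k, observed in enumerate(beams):
            if not observed:  # nothing left to correct: force EOS
                batch_idx.append(b)
                beam_idx.append(k)
                token_idx.append(eos_token_id)
                dist_probs.append(1.0)
                beam_lengths.append({})
                force_eos.append(True)
                continue
            # Stand-in transformation: a single identity candidate.
            batch_idx.append(b)
            beam_idx.append(k)
            token_idx.append(ord(observed[0]))
            dist_probs.append(1.0)
            beam_lengths.append({ord(observed[0]): 1})
            force_eos.append(False)
        token_lengths.append(beam_lengths)
    return batch_idx, beam_idx, token_idx, dist_probs, token_lengths, force_eos

out = get_distortion_probs([["他", ""]], eos_token_id=2)
```

The flat parallel-list layout makes it cheap to scatter the distortion probabilities into a (batch × beam, vocab) score tensor in a single indexed write.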

lmcsc.generation.distortion_guided_beam_search[source]#

A modified beam search function for CSC (Chinese Spelling Correction).

Notes

This code is based on the beam_search function in the transformers library. We make five modifications to the original code:

  1. Initialization.

  2. Intervention of decoding process.

  3. Update the observed sequences.

  4. Remove stopping_criteria.

  5. Put the generated results into Streamer.

You can search for ## Modification X.* in the code to locate each corresponding change.

Parameters:
  • observed_sequence_generator (BaseObversationGenerator) – An instance of [BaseObversationGenerator] that defines how observed sequences are generated.

  • beam_scorer (BeamScorer) – A derived instance of [BeamScorer] that defines how beam hypotheses are constructed, stored, and sorted during generation. For more information, read the documentation of [BeamScorer].

  • input_ids (torch.LongTensor of shape (batch_size, sequence_length), optional) – The sequence used as a prompt for the generation.

  • logits_processor (LogitsProcessorList, optional) – An instance of [LogitsProcessorList]. List of instances of class derived from [LogitsProcessor] used to modify the prediction scores of the language modeling head applied at each generation step.

  • stopping_criteria (StoppingCriteriaList, optional) – An instance of [StoppingCriteriaList]. List of instances of class derived from [StoppingCriteria] used to tell if the generation loop should stop.

  • max_length (int, optional, defaults to 20) – DEPRECATED. Use logits_processor or stopping_criteria directly to cap the number of generated tokens. The maximum length of the sequence to be generated.

  • pad_token_id (int, optional) – The id of the padding token.

  • eos_token_id (Union[int, List[int]], optional) – The id of the end-of-sequence token. Optionally, use a list to set multiple end-of-sequence tokens.

  • output_attentions (bool, optional, defaults to False) – Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more details.

  • output_hidden_states (bool, optional, defaults to False) – Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more details.

  • output_scores (bool, optional, defaults to False) – Whether or not to return the prediction scores. See scores under returned tensors for more details.

  • return_dict_in_generate (bool, optional, defaults to False) – Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.

  • synced_gpus (bool, optional, defaults to False) – Whether to continue running the while loop until max_length (needed for ZeRO stage 3).

  • model_kwargs – Additional model-specific kwargs will be forwarded to the forward function of the model. If the model is an encoder-decoder model, the kwargs should include encoder_outputs.

Returns:

[generation.GenerateBeamDecoderOnlyOutput], [~generation.GenerateBeamEncoderDecoderOutput] or torch.LongTensor: A torch.LongTensor containing the generated tokens (default behaviour) or a [~generation.GenerateBeamDecoderOnlyOutput] if model.config.is_encoder_decoder=False and return_dict_in_generate=True or a [~generation.GenerateBeamEncoderDecoderOutput] if model.config.is_encoder_decoder=True.
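The "intervention of decoding process" modification can be sketched as follows, assuming the distortion probabilities are applied as an additive log-space bias to the next-token scores before beam selection (the function and its parameters are illustrative, not the library's API):

```python
import math

def intervene_scores(next_token_logprobs, batch_idx, token_idx, dist_probs,
                     floor=math.log(1e-9)):
    """Bias next-token log-probs with distortion log-probs (illustrative).

    Tokens with no distortion mass are pushed down to `floor`, so the beam
    search is steered toward tokens supported by the observed sequence.
    """
    biased = [[floor] * len(row) for row in next_token_logprobs]
    for b, t, p in zip(batch_idx, token_idx, dist_probs):
        biased[b][t] = next_token_logprobs[b][t] + math.log(p)
    return biased

# One beam, three-token vocabulary.
scores = [[math.log(0.5), math.log(0.3), math.log(0.2)]]
biased = intervene_scores(scores, batch_idx=[0, 0], token_idx=[0, 2],
                          dist_probs=[0.2, 0.8])
best = max(range(3), key=lambda i: biased[0][i])
```

Note how token 2, weaker under the language model alone, wins once the distortion evidence is added, while token 1 is effectively pruned for lacking any distortion support.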
